ON-DEMAND TRANSFER ENGINE

This application claims priority from U.S. Provisional Application Ser. No. 60/065,855 entitled "Multipurpose Digital Signal Processing System" filed on Nov. 14, 1997, the specification of which is hereby expressly incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to a memory transfer device. More particularly, it relates to a memory transfer device allowing a large number of transfer blocks to be passed over a Peripheral Component Interconnect (PCI) bus in a personal computer.

2. Background of Related Art

In traditional Industry Standard Architecture (ISA) based personal computing systems, a Direct Memory Access (DMA) controller is responsible for transferring data between host system memory and peripheral input/output (I/O) devices, e.g., a floppy disk, a hard drive, an audio device, etc.

FIG. 9 shows a conventional personal computer (PC) based system including a host processor 906, and a plurality of peripheral devices 902-904. A DMA controller 910 in communication with a PCI bus 140 through the PCI to ISA bridge 907 facilitates the transfer of blocks of data to and from peripheral to peripheral or host to peripheral.

A conventional DMA controller is typically capable of handling a maximum of only four block transfer channels in a single DMA controller mode. One such conventional DMA controller is a Model 8237 available from Intel and found in many personal computers. In enlarged systems, a secondary DMA controller 912 may be included in a master-slave configuration to the master DMA controller 910 to provide a total of up to 7 data stream transfer channels.

FIG. 10 shows the centrally located input/output (I/O) mapped registers defined for each channel in a DMA controller 910, 912. These registers are typically programmed only by the host 906.

Typical registers in a DMA controller 910, 912 are a 16-bit host buffer address (e.g., source start address) register 940, a destination start address register 942, a 16-bit transfer count (e.g., byte count) register 944, and perhaps even an 8-page buffer (not shown). The conventional DMA controller 910, 912 is programmed with a value of the source start address 940, the destination start address 942, and the length of the data block to be transferred (byte count) 944 for each of the 7 data transfer channels.

To initiate a data transfer, a host device must program each of the source start address 940, the destination start address 942, and the byte count 944, and, whenever the peripheral desires to transfer data, send a request to the DMA controller 910, 912 to initiate the data transfer. To transfer buffered blocks of data relating to a continual data stream, particularly buffered blocks of data having a variable length, the byte count register 944 relating to the appropriate DMA channel must be programmed before the transfer of each block of data. Unfortunately, the time required for communication over the PCI bus 140 to effect the appropriate change in the length of the data block (i.e., to update the byte count register 944) limits the total amount of data which may be transferred in any given amount of time.

Although the centralized concept of a DMA controller provides the ability to transfer as many as 7 data blocks, the transfer requires communication with the centrally located DMA controller 910, 912. Because the conventional DMA controller is centrally located, access may be limiting to certain applications transferring large amounts of data. Moreover, as discussed, applications transferring blocks of data which have a variable length (e.g., some audio applications) require arbitration for the PCI bus 140 and communication with the DMA controller by the requesting device to reset the block length before each data transfer, potentially wasting time, increasing traffic on the PCI bus 140, decreasing efficiency in the data transfer, and expending valuable MIP (million instruction per second) capacity in the requesting device. Thus, management of the data buffer to be transferred is quite limited and does not offer much flexibility to the user in a DMA controller-based system.

Many conventional agents such as an IDE hard disk controller or a SCSI controller have been implemented to use one or two channels of a DMA controller. However, today's computing advances are becoming limited by the relatively small number of block transfer channels made available by conventional DMA controllers. For instance, hardware accelerated multimedia applications would benefit greatly from the ability to transfer more than 7 channels (i.e., data streams) between host memory and peripherals available using today's technology.

There is thus a need for a more versatile and distributed apparatus and method for allowing the transfer of more than 7 data streams in a personal computer (PC) related application.

SUMMARY OF THE INVENTION

In accordance with the principles of the present invention, a block memory transfer module comprises a start address [illegible] transferred. The start address is maintained in memory of a first device, while a length of the [illegible] memory to be transferred is maintained in memory [illegible].

A method of transferring a large plurality of blocks of data [illegible] accordance with [illegible] comprises distributing a plurality of data transfer engines among a respective plurality of devices connected to a data bus, each data transfer engine including a length of a respective at least one of the plurality of blocks of data. A centralized data buffer is maintained relating to one of a source and destination of the plurality of blocks of data to be transferred. Each of the plurality of blocks of data is transferred over a separate one of the plurality of data transfer channels based on the length of the plurality of blocks of data established by each of the distributed plurality of data transfer engines.

BRIEF DESCRIPTION OF THE DRAWINGS

[illegible] advantages of the present invention will become apparent to those skilled in the art from the following [illegible]:

FIG. 1 shows a computer system including one or more peripherals having an on-demand transfer (ODT) engine in accordance with the principles of the present invention.

FIG. 2 shows the contents of a memory block within the PC system, e.g., in or relating to the host processor, in accordance with the principles of the present invention.

FIGS. 3A and 3A(1) show a circular, dynamic stream interrupt queue in the memory block shown in FIG. 2.

FIG. 3B shows a stream pointer buffer in the memory block shown in FIG. 2.

FIG. 3C shows one of up to 128 data stream cyclic buffers in the memory block shown in FIG. 2.

FIG. 4 shows in more detail an on-demand transfer (ODT) engine shown in FIG. 1.

FIG. 5A shows a stream request queue in the ODT engine shown in FIG. 4.

FIG. 5B shows a stream parameter table in the ODT engine shown in FIG. 4.

FIG. 5C shows a stream data storage block in the ODT engine shown in FIG. 4.

FIG. 6 shows the status and control registers of FIG. 4 in more detail.

FIGS. 7A, 7A(1A) and 7A(1B) show the ODT system and control register of FIG. 6 in more detail.

FIGS. 7B and 7B(1A), 7B(1B), 7B(2), 7B(3A), 7B(3B) to 7B(5) show the ODT transfer status and control register of FIG. 6 in more detail.

FIGS. 7C and 7C(1) show the host peripheral queue depth register of FIG. 6 in more detail.

FIGS. 7D, 7D(1) and 7D(2) show the peripheral stream pointers register of FIG. 6 in more detail.

FIGS. 7E and 7E(1A), 7E(1B), 7E(1B), 7E(2A), 7E(2B) to 7E(5) show the ODT stream parameter table of FIG. 6 in more detail.

FIGS. 7F, 7F(1) and 7F(2) show the ODT's host interrupt pointer registers of FIG. 6 in more detail.

FIGS. 8A and 8B show an operative flow of register information in the disclosed ODT engine constructed in accordance with the principles of the present invention.

FIG. 9 shows a conventional personal computer (PC) based system including a host processor and a plurality of peripheral devices.

FIG. 10 shows the basic registers in a DMA controller relating to each data transfer channel.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

A motivation for development of the ODT engine as disclosed herein is the proliferation of new modem and multimedia applications surrounding "Direct-X" function calls in the Microsoft Windows™ operating system. The ODT engine provides a maximum amount of flexibility for a host and any agent to manage transfers across the PCI bus with the smallest impact to processing "million instructions per second" (MIPS) as well as memory relating to both the host and the agents.

Most personal computers (PCs) are conventionally equipped with a Peripheral Component Interconnect (PCI) bus. The PCI bus is a versatile bus over which any agent connected to the PCI bus can acquire ownership of the bus. The PCI bus is currently a best candidate bus to provide access to system resources in a burst mode with low processor overhead. The PCI bus standard was developed in response to a marketplace which was becoming crowded with various permutations of local bus architectures implemented in short-sighted fashions.

The first release of the PCI bus specification, version 1.0, became available on Jun. 22, 1992; Revision 2.0 became available in April of 1993, and Revision 2.1 of the specification became available in the first quarter of 1995. All three of these revisions are specifically and explicitly incorporated herein by reference.

The PCI bus can be populated with adapters requiring fast access to each other and/or system memory, and that can be accessed by a host processor at speeds approaching that of the processor's full native bus speed. It is important to note that all read and write transfers over the PCI bus are burst transfers.

The length of the burst is negotiated between the initiator and target devices and may be of any length.

In the disclosed embodiment, the ODT engine is situated between PCI Bus Interface Logic and multi-ported random access memory (RAM) shared by two DSPs.

The PCI bus, unlike the conventional ISA bus, has the capability for peer-to-peer transfers. In a peer-to-peer [illegible] another agent on the bus. The capabilities of the PCI bus have enabled the development of a distributed data transfer architecture including what is referred to herein as an on-demand transfer engine in each relevant peripheral which will transfer blocks of data.

In this distributed architecture, any agent that requires transfer of data to or from the host memory or to or from a peer agent preferably defines required block data capabilities consistent with the needs of the agent. For example, a hard disk controller may require only one or two block transfer channels for data transfer, whereas an audio accelerator for multimedia applications may require as many as 8 or many more block transfer channels. Other multimedia applications which can benefit from a high bandwidth data transfer capability include MPEG decoders and video accelerators. Conventional DMA architecture is not only limited as to the number of available data transfer channels, but also becomes quite cumbersome as a centrally located device as the number of data transfer channels increases, e.g., up to 128 as are provided by the disclosed embodiment.

The present invention defines a scaleable architecture, i.e., [illegible] targeted for use in any data transfer application. An ODT engine in accordance with the principles of the present invention provides many features that [illegible] using conventional DMA controllers, including the ability to support large numbers of block transfer channels.

The disclosed embodiment of an ODT engine is a scaleable data transfer module that can support the transfer across a PCI bus of anywhere from 1 to 128 (or more) independent data streams or block data transfer channels for high bandwidth applications. The data can be transferred from host to agent or agent to agent. Each of these 128 streams of data can be of any arbitrary data type, e.g., stereo audio samples, voice samples, modem data, modem bulk delay data, filter coefficients, command control data, and/or DSP program code.

The disclosed ODT engine includes a set of registers that are preferably located in a shared memory location which is accessible by the host and/or any relevant peripheral agent (e.g., a Digital Signal Processor (DSP)). Preferably, as in the disclosed embodiment, the shared memory location is I/O mapped into host I/O memory space.

All pertinent channel information, i.e., start address, word count, and block count for each channel, is programmable. However, as will be described in more detail, the block count for each data transfer block is maintained in a separate memory location, e.g., in the ODT of the relevant peripheral, to enable the peripheral to change the length of the data block "on-the-fly". This greatly reduces MIP overhead, particularly with respect to ongoing data streams having variable block sizes as are present in audio applications.

The disclosed ODT engine also includes a flexible interrupting scheme to both the host and to the relevant peripheral agent. Moreover, a plurality of transfer modes are available, e.g., for transfer of data or code overlays.

FIG. 1 shows a computer system including one or more peripherals having an on-demand transfer (ODT) engine in accordance with the principles of the present invention.

In particular, a typical computer system will include a host processor 106 and one or more peripheral devices 102-104. In accordance with the disclosed embodiment, each peripheral device 102-104 which will request the transfer of data will include an ODT engine 100, e.g., ODTs 100a and 100b in the peripheral devices 104 and 102, respectively.

The host 106 and the peripheral devices 102, 104 communicate with one another over an industry standard PCI bus 140. Although the present invention has been described with respect to an embodiment utilizing the PCI standard bus, the principles of the present invention are equally applicable to other bus standards, but particularly to a bus standard implementing burst communications.

A memory block 110 is located somewhere in the PC system, e.g., in the host 106. However, in accordance with the principles of the present invention, the memory block 110 may be located anywhere accessible by the PCI bus 140, including in either of the peripheral devices 102, 104.

FIG. 2 shows the contents of an exemplary memory block 110 within the PC system, e.g., in the host 106, in accordance with the principles of the present invention.

The memory block includes a dynamic stream interrupt queue 202, a buffer of from 1 to 128 stream pointers 204, and from 1 to 128 data stream cyclic buffers 206.

FIGS. 3A and 3A(1) show an exemplary circular, dynamic stream interrupt queue in the memory block shown in FIG. 2.

In particular, column 330 in FIG. 3A indicates whether or not the entry in the dynamic stream interrupt queue 202 is a valid entry ('1') or an invalid entry ('0'). Column 332 indicates the bank number which is to be transferred, e.g., see FIG. 3C. Column 334 indicates a direction of the data transfer, e.g., a '0' indicates a transfer from a peripheral to the host, and a '1' indicates a transfer from host to the peripheral. Column 336 includes the status bits indicating the type of interrupt which is being activated. These bits relate to the host's perspective, and are preferably the same as the six bits from the peripheral's perspective in the ODT Stat register 716 of FIG. 7B. Column 338 represents the stream number, i.e., channel number.

FIG. 3A(1) is a table showing one exemplary implementation of a host stream interrupt queue pointer register.

FIG. 3B shows an exemplary stream pointer buffer in the memory block shown in FIG. 2. Each entry 340-346 is a 32-bit stream pointer indicating the current address of the ODT engine. Two 32-bit stream pointers 340, 342 or 344, 346 correspond to each data stream. Each 32-bit stream pointer indicates the starting address in the host cyclic buffer 206, e.g., as shown in FIG. 3C.

FIG. 3C shows one of up to 128 data stream cyclic buffers in the memory block shown in FIG. 2, and is otherwise known as a host cyclic buffer. Note, for instance, that the 32-bit stream pointer 340 in the example of FIG. 3B indicates the address of the top of bank 350 shown in FIG. 3C.

The entries 361-363 shown in FIG. 3C represent the blocks of data being transferred. In operation, after, for example, data block 361 is transferred, the starting memory address of the data block 362 is input into the 32-bit stream pointer 342 (FIG. 3B).

Data blocks 371-373 are similar blocks of data to be transferred, but from/to the alternate bank 352. The use of two banks 350, 352 allows operation in a ping-pong fashion. Preferably, to avoid conflicts, host and peripherals do not operate on both banks 350, 352 simultaneously.

In operation, the ODT engine generates a stream interrupt to the host whenever the ODT engine reaches the end of a [illegible] substantially the same time, an entry is written into the host stream interrupt queue 202 to initiate a service interrupt.

FIG. 4 shows in more detail an on-demand transfer (ODT) engine shown in FIG. 1.

In particular, as shown in FIG. 4, the disclosed embodiment of an ODT engine 100 includes various status and control registers 408, a stream request queue 402, a stream parameter table 404, and stream data storage 406.

FIG. 5A shows a stream request queue 402 in the ODT engine shown in FIG. 4.

The rows represent individual entries 520-522, and the columns 502-510 represent the contents of each entry 520-522. For instance, column 502 is a mask bit to allow masking of the relevant interrupts. Column 504 is a flag indicating whether or not the stream request is active ('1') or inactive ('0'). Column 506 indicates the direction of data transfer being requested. Column 508 indicates the size of the block of data being transferred, e.g., the number of words to be transferred. For instance, in the disclosed embodiment, the actual number of words transferred is one [illegible]. Column 510 is the stream ID number.

The flag 504 is a bit in the stream request queue 402 which represents the validity of an entry. For instance, a flag bit 504 of '1' indicates a valid interrupt request, whereas a flag bit 504 of '0' is generated after the peripheral has serviced the relevant interrupt and clears the flag bit 504.

FIG. 5B shows a stream parameter table in the ODT engine 100 shown in FIG. 4. The stream parameter table 404 shown in FIG. 5B shows three separate entries relating to three respective data streams. Each entry includes a set of information relevant to where the data is located both on the peripheral side and the host side.

FIG. 5C shows a stream data storage block in the ODT engine 100 shown in FIG. 4. The stream data storage [illegible].

FIG. 6 depicts various status and control registers implemented in the ODT engine 100 in the embodiment shown in FIG. 4. The disclosed embodiment includes an ODT system and control register 602, an ODT transfer status and control register 604, a host and peripheral queue depth register 606, a peripheral stream pointers register 608 including a peripheral stream request pointer and a peripheral stream parameter table pointer, an ODT stream parameter table 610, and the host interrupt pointer register 612. The ODT system and control register 602 is shown in more detail in FIGS. 7A, 7A(1) and 7A(2), the ODT transfer status and control register 604 is shown in more detail in FIGS. 7B and 7B(1) to 7B(5), the host and peripheral queue depth register 606 is shown in more detail in FIGS. 7C and 7C(1), the peripheral stream pointers register 608 is shown in more detail in FIGS. 7D, 7D(1) and 7D(2), the ODT stream parameter table 610 is shown in more detail in FIGS. 7E and 7E(1) to 7E(5), and the ODT's host interrupt pointer register 612 is shown in more detail in FIGS. 7F, 7F(1) and 7F(2).

A time-out event may be established with a programmable ODT timer that is under host or peripheral control. Such a
timer would provide an automatic method of setting the "Go" bit in the ODT transfer status and control register 604, e.g., every 1 μsec to every 100 msec. The "Go" bit may be automatically cleared when the ODT has sequenced through one complete pass of the Stream Request Queue.

The ODT preferably enters an idle state (e.g., goes to "sleep") when the "Go" bit is deactivated. This provides the host and peripherals with a mechanism to determine whether any ODT engine is actively transferring data or is idle. Additionally, this scheme allows the relevant ODT transfer rate to adjust dynamically to match stream bandwidth requirements at any given time, and also saves power by reducing the number of memory accesses.

FIG. 8 shows an operative flow of register information in the disclosed ODT engine constructed in accordance with the principles of the present invention.

A specific implementation of the various registers in the ODT engine 100 is described in the following tables. It is to be understood that the specific bits, sizes, addresses and other features of the registers and memory in or relating to the ODT engine 100 may be quite different from those disclosed herein but remain covered by the principles of the present invention.

In operation, an agent or particular application will request a data stream transfer from the ODT engine 100 by programming an entry 520-524 in the Stream Request Queue (SRQ) 402. The disclosed SRQ entry 520-524 comprises a block transfer size 508, a stream ID number 510, a direction of transfer 506, a transfer request flag 504, and a host interrupt mask bit 502.

The SRQ 402 preferably has a programmable depth and is completely relocateable within the memory space of the relevant peripheral via an SRQ base address register (not shown).

Each data stream identified by a stream ID number 510 in the SRQ 402 has an associated Stream Parameter Table (SPT) 404. The SPT 404 is initialized by the requesting peripheral or host to provide the start address 542 of the data block to be transferred, and the number of data blocks 540 to be transferred. The SPT 404 is preferably located in the same memory map as the SRQ 402, and is also relocateable within the respective memory maps of the host and/or peripheral.

Thus, any device requesting a data transfer inputs an entry 520-524 in the SRQ 402 and initializes a corresponding SPT 404. Once the peripheral or host has initialized the relevant data streams for block transfers, the ODT engine 100 will be given a 'GO' command 712 (FIG. 7B) by the requesting peripheral or host to initiate the start of data transfer.

Upon receiving the GO command 712 via the ODT transfer status and control register 604 (FIG. 7B), the ODT engine 100 will monitor the SRQ 402 for a valid request. If a valid request is present in the SRQ 402, then the ODT engine 100 will fetch the corresponding SPT 404 for the data stream and complete the data transfer.

Upon completion of the single block transfer, the ODT engine 100 will update the SRQ entry 520-524 by resetting its transfer request flag in the ODT stream request queue entry 504, and will update the corresponding SPT entry 520-524 with new pointers. After going through the entire stream request queue, the ODT engine will reset its transfer request flag in the ODT transfer status and control register 604. The ODT engine 100 will also update the host address pointer 204 in the host memory 110 after each block transfer. This is a useful feature and enables the host driver to query the current position of the buffer pointer 204 relating to the requesting ODT engine 100 simply by reading a memory location in the host memory 110.

If the ODT engine 100 has reached, e.g., a half buffer mark H as shown in FIG. 3C, it will cause an entry to be made in the host's dynamic stream interrupt queue 202 and will initiate an interrupt to the host (if the interrupt is enabled). The entry 310-324 comprises the status of the interrupt 336 and the data stream ID 338.

The ODT engine 100 will continue to monitor the SRQ 402 until all the SRQ entries 520-524 are exhausted.

The requesting peripheral can request the transfer of a subsequent block of data by making another entry in the SRQ 402 and issuing a GO command 712 to the ODT engine 100.

In accordance with the disclosed embodiment, buffer pointers 204 (including the wrap-around of buffer pointers at [illegible] by the ODT engine 100 without further involvement from the host.

Different modes can be established in the ODT engine 100 based on the needs of the particular application. For instance, the ODT engine 100 can include a code download mode allowing the transfer of up to 16 K words in a single block transfer, i.e., with one SRQ entry 520-524 and a single GO command 712.

The register definitions and operation of the ODT engine 100 are described herein with respect to a modem and audio application. The ODT engine 100 has a wide range of applications, including but not limited to sample rate conversion, off-loading bulk delays, dynamic coefficient downloading, in-place block processing schemes, and other [illegible] transfers of data or program code.

In general, the disclosed ODT engine 100 supports data transfers of from 1 to 128 independent data streams. Each data stream is associated with its own data storage buffer of, e.g., from 1 to 64 words. Each data stream storage block is on a word aligned boundary.

Moreover, each data stream has its own host cyclic buffer 206 in the host memory 110. Each host cyclic buffer 350, 352 (FIG. 3C) is programmable to be from 4 to 64K Bytes deep. Each host cyclic buffer 350, 352 can overlap, e.g., Direct-Sound memory allotments.

Host applications can query each data stream and determine the current position being transferred within each stream's host cyclic buffer 350, 352 without accessing the registers of the ODT engine 100 and without involvement of the peripheral supporting the memory. The current position can be determined to an accuracy of the number of words in a block.

[illegible] master accesses to the host system memory 110 will be 32-bit wide accesses with 26 bits of accuracy. The beginning address of the host cyclic buffer 350, 352 of each data stream is on a 4 byte aligned boundary. In the disclosed embodiment, the ODT engine 100 resides within a 64 [illegible] system memory space.

The ODT engine 100 supports both WORD and DWORD data size transfers across the PCI bus 140 to optimize throughput across the PCI bus 140.

The dynamic host stream interrupt queue (SIQ) 202 allows a host interrupt service routine (ISR) to independently service the ODT engine's interrupt for each data stream. Entries in the dynamic host stream interrupt queue 202 are updated by the relevant ODT engine 100.

The ODT engine 100 identifies which data stream is requesting a block transfer, and passes ODT status information through each entry 520-524 in the stream request queue 402.

The ODT engine 100 requires low host MIP overhead in
servicing the individual interrupts from the various ODT
engines 100a, 100b even when supporting large numbers of
data stream transfers.

The ODT engine 100 provides programmable depth con- 
trol for the dynamic stream interrupt queue 202 up to a 
maximum of, e.g., 256 word entries. The dynamic stream 
interrupt queue 202 allows the ODT engine 100 to recognize 
that the peripheral or host has requested one or more data 
blocks to be transferred. 
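
The behavior of a circular stream interrupt queue with a programmable depth can be illustrated with a short sketch. The following Python model is only an illustration under stated assumptions: the class and method names are hypothetical, the entry fields mirror columns 330-338 of FIG. 3A, and a "collision" is modeled as the engine finding a slot whose valid bit the host has not yet cleared.

```python
from dataclasses import dataclass

VALID = 1  # column 330: set by the engine, cleared by the host after service


@dataclass
class SiqEntry:
    valid: int = 0      # column 330
    bank: int = 0       # column 332
    direction: int = 0  # column 334: 0 = peripheral to host, 1 = host to peripheral
    status: int = 0     # column 336
    stream_id: int = 0  # column 338


class StreamInterruptQueue:
    """Hypothetical model: written by an ODT engine, drained by the host."""

    def __init__(self, depth: int = 256):
        if not 1 <= depth <= 256:
            raise ValueError("depth must be between 1 and 256 entries")
        self.entries = [SiqEntry() for _ in range(depth)]
        self.write_index = 0  # next slot the engine writes
        self.read_index = 0   # next slot the host services

    def post(self, bank: int, direction: int, status: int, stream_id: int) -> bool:
        """Engine side: return False on a collision (slot is still valid)."""
        if self.entries[self.write_index].valid == VALID:
            return False
        self.entries[self.write_index] = SiqEntry(VALID, bank, direction, status, stream_id)
        self.write_index = (self.write_index + 1) % len(self.entries)
        return True

    def service(self):
        """Host side: yield each pending entry, then clear its valid bit."""
        while self.entries[self.read_index].valid == VALID:
            entry = self.entries[self.read_index]
            yield entry.stream_id, entry.bank, entry.direction, entry.status
            entry.valid = 0
            self.read_index = (self.read_index + 1) % len(self.entries)
```

In this sketch a host interrupt service routine would iterate over `service()` once per interrupt, so several masked entries can be drained in a single pass.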

Entries in the stream request queue 402 preferably pro- 
vide sufficient information for the ODT engine 100 to i) 
identify the data stream block which has been requested for 
transfer; ii) identify the word size of the data stream block; 
and iii) identify the direction of transfer for the request. The 
entries 520-524 in the stream request queue 402 include a 
request flag bit 504 set by the requesting peripheral and 
monitored by the relevant ODT engine 100 to determine 
whether the previously requested data block has already 
been transferred. 
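
As an illustration of how the fields of a stream request queue entry could share one word, the sketch below packs the mask bit 502, request flag 504, direction 506, block size 508 and stream ID 510 into 16 bits. The bit positions, the zero-based stream ID, and the "words minus one" size encoding are assumptions made for this example only; the actual layout is defined by the register tables of the disclosed embodiment.

```python
# Hypothetical 16-bit layout (an assumption, not the disclosed register map):
# bit 15 mask 502 | bit 14 flag 504 | bit 13 direction 506
# bits 12-7 size 508 (words - 1) | bits 6-0 stream ID 510
MASK_BIT, FLAG_BIT, DIR_BIT, SIZE_SHIFT = 15, 14, 13, 7
SIZE_FIELD, ID_FIELD = 0x3F, 0x7F


def pack_srq_entry(stream_id: int, words: int, direction: int,
                   flag: int = 1, masked: int = 0) -> int:
    """Build one request word; raises ValueError on out-of-range fields."""
    if not 0 <= stream_id <= ID_FIELD:
        raise ValueError("stream ID must be 0..127")
    if not 1 <= words <= SIZE_FIELD + 1:
        raise ValueError("block size must be 1..64 words")
    if direction not in (0, 1) or flag not in (0, 1) or masked not in (0, 1):
        raise ValueError("direction, flag and masked are single bits")
    return ((masked << MASK_BIT) | (flag << FLAG_BIT) | (direction << DIR_BIT)
            | ((words - 1) << SIZE_SHIFT) | stream_id)


def unpack_srq_entry(entry: int) -> dict:
    """Split a request word back into its named fields."""
    return {
        "masked": (entry >> MASK_BIT) & 1,
        "flag": (entry >> FLAG_BIT) & 1,
        "direction": (entry >> DIR_BIT) & 1,
        "words": ((entry >> SIZE_SHIFT) & SIZE_FIELD) + 1,
        "stream_id": entry & ID_FIELD,
    }


def clear_request_flag(entry: int) -> int:
    """What the engine does once the requested block has been transferred."""
    return entry & ~(1 << FLAG_BIT) & 0xFFFF
```

A requesting peripheral would write the packed word with the flag set, and could later read the word back to see whether the engine has cleared the flag.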

The interrupts to the host 106 are preferably indepen- 
dently maskable to allow the requesting peripheral to make 
multiple entries in the dynamic stream interrupt queue 202 
without requiring an actual interrupt to the host 106 to occur. 

The stream request queue 402 has programmable depth 
control to minimize the amount of RAM required for usage 
by the ODT engine 100. 
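
One pass of the engine over a stream request queue of programmed depth, following the GO sequence described earlier, might be sketched as follows. This is a simplified software model, not the hardware implementation: the dictionary keys, the `transfer` callback, and the pointer update in units of words are all assumptions made for illustration.

```python
def run_srq_pass(srq, spt, control, transfer):
    """Scan the queue once after GO; return the stream IDs that were serviced.

    srq      -- list of request entries; its length is the programmed depth
    spt      -- stream parameter table, keyed by stream ID
    control  -- stand-in for the transfer status and control register
    transfer -- hypothetical callback that moves one block over the bus
    """
    serviced = []
    if not control["go"]:
        return serviced                      # engine stays idle without GO
    for entry in srq:
        if not entry["flag"]:
            continue                         # no valid request in this slot
        params = spt[entry["stream_id"]]     # fetch the stream's parameters
        transfer(entry["stream_id"], params["address"],
                 entry["words"], entry["direction"])
        params["address"] += entry["words"]  # assumed pointer update, in words
        entry["flag"] = 0                    # request has been serviced
        serviced.append(entry["stream_id"])
    control["go"] = 0                        # cleared after one complete pass
    return serviced
```

Because the loop length equals the queue depth, a shallower queue means fewer memory reads per pass, which is consistent with the stated goal of minimizing RAM usage.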

The ODT engine 100 supports a transparent transfer mode 
which allows the peripheral (e.g., including a DSP) to use 
host system memory as an extension of the peripheral's 
RAM block size without any involvement by the host 106. 

Preferably, the ODT engine 100 does not generate an 
entry to the dynamic stream interrupt queue 202, and does 
not generate an interrupt to the host 106. The ODT engine 
100 generates an interrupt to the peripheral when the periph- 
eral has reached the end of each host bank, which is half the 
host cyclic stream buffer as shown in FIG. 3C. This implies
two interrupts to the peripheral, one for read (RX) transfers 
and the other for write (TX) transfers. 
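
The end-of-bank condition can be shown with a small position calculation. The sketch below assumes a cyclic buffer measured in words, split into two equal banks, with blocks that never straddle a bank boundary; the function name and these restrictions are illustrative assumptions rather than part of the disclosed embodiment.

```python
def advance(position: int, words: int, buffer_words: int):
    """Move the stream position by one block.

    Returns (new_position, finished_bank), where finished_bank is 0 or 1 when
    the block ended exactly at the end of that bank, and None otherwise.
    """
    if buffer_words <= 0 or buffer_words % 2:
        raise ValueError("buffer must hold a positive, even number of words")
    half = buffer_words // 2                 # each bank is half the buffer
    if not 1 <= words <= half - position % half:
        raise ValueError("block would cross a bank boundary")
    new_position = position + words
    finished_bank = None
    if new_position % half == 0:             # reached the end of a bank
        finished_bank = (new_position // half - 1) % 2
    return new_position % buffer_words, finished_bank  # wrap at buffer end
```

A caller would raise the peripheral interrupt whenever `finished_bank` is not `None`, once per bank and therefore twice per trip around the cyclic buffer.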

The ODT engine 100 does not wait for the peripheral to 
respond to the interrupt. Instead, the interrupt to the periph- 
eral by the ODT engine 100 would be cleared by the 
peripheral via a read of an ODT engine interrupt status 
register. In the disclosed embodiment, the ODT engine's 
interrupt is double buffered to prevent the peripheral from 
missing an interrupt event. 

The ODT engine 100 allows the peripheral to control 
where in the data stream cyclic buffer 206 the transfer 
request is to occur. This implies that the peripheral can 
control, e.g., 26 bits of the current 32 bit stream pointer 204 
used during a block transfer. 

The ODT engine 100 supports the transfer of larger than 
64 continuous words per stream by allowing a transfer 
request for the transfer of multiple blocks (1 to 64 words 
each) without managing any peripheral or host address 
pointers. 
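
A transfer longer than 64 words can be viewed as a sequence of blocks of 1 to 64 words each. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical, and filling every block except the last to the 64-word maximum is one possible policy, not a requirement stated in the text.

```python
MAX_BLOCK_WORDS = 64  # upper bound on a single block, per the description


def split_into_blocks(total_words: int) -> list:
    """Return the block sizes, in words, that together cover total_words."""
    if total_words < 1:
        raise ValueError("at least one word must be transferred")
    full_blocks, remainder = divmod(total_words, MAX_BLOCK_WORDS)
    sizes = [MAX_BLOCK_WORDS] * full_blocks
    if remainder:
        sizes.append(remainder)  # final, shorter block
    return sizes
```

For example, a 150-word request would be covered by two 64-word blocks followed by one 22-word block.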

The ODT engine 100 includes an auto-increment flag bit 
which the peripheral would set once. This bit is used by the 
ODT engine 100 to indicate that the next peripheral address 
which will be used by the ODT engine 100 for the beginning 
of the next block transfer will be stored back into the 
peripheral's RAM as part of the Stream Parameter Table 
(SPT) 404. The default value of this auto-increment flag bit 
assumes that the peripheral is not using auto-increment 
mode, and that the peripheral is responsible for updating the 
peripheral's address if necessary. 
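
The effect of the auto-increment flag can be sketched as follows. The dictionary keys and the address arithmetic in words are assumptions for illustration; the point is only that the next peripheral address is written back to the stream's SPT entry when the flag is set, and left untouched otherwise.

```python
def finish_block(spt_entry: dict, words: int) -> int:
    """Return the peripheral address to be used for the next block transfer."""
    next_address = spt_entry["peripheral_address"] + words
    if spt_entry.get("auto_increment", 0):   # default 0: peripheral manages it
        spt_entry["peripheral_address"] = next_address  # stored back in the SPT
    return spt_entry["peripheral_address"]
```

With the flag clear, the same address is returned after every block, so the peripheral must update the SPT entry itself if it wants the transfer to move on.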
Since this feature may be used to download agent code,
e.g., DSP code, "on-the-fly", the peripheral requires an
interrupt from the ODT engine 100 indicating that a set of
multiple consecutive entries for a given data stream has been
transferred.

All data stream transfer information is preferably grouped 
per stream by the ODT engine 100 in a common area in 
memory, i.e., in the SPT 404. 

Each stream's block data storage area in memory is 
allowed to be allocated in independent, non-contiguous 
areas, i.e., stream data storage. Each stream's host cyclic 
buffer storage area is allocated in separate independent 
noncontiguous areas as well. 

In accordance with the principles of the present invention, 
the ODT registers for the ODT engine 100 of each peripheral 
device are distributed among the respective peripheral 
devices. Moreover, the ODT registers are accessible by the 
host or another peripheral. 

Preferably, in the ODT engine 100, maskable peripheral 
interrupts are established for the following: 

(a) When the ODT engine 100 has detected a collision 
with the host 106 due to the host not clearing the HI bit 
532 in the stream parameter table 404. 

(b) When the ODT engine 100 has completed a stream 
transfer and the ODT engine 100 passes a stream ID 
number 718 via the ODT transfer status and control 
register 604. This interrupt is preferably self-cleared 
when the peripheral reads the ODT transfer status and 
control register 604. 

(c) When the ODT engine 100 has detected a collision 
with the dynamic stream interrupt queue 202 via the 
MSB bit 330 (FIG. 3A) not being cleared. The host 106 
must service each stream's cyclic buffer 350, 352 
indicated by each entry in the dynamic stream interrupt 
queue 202, then clear the MSB bit 330 in the relevant 
entry to inform the ODT engine 100 that the host 106 
has completed the relevant cyclic buffer service 
request. 

(d) When the ODT engine 100 has detected a wait to 
access the peripheral RAM 804, in which case the 
ODT engine 100 will generate an interrupt. This interrupt 
is preferably cleared by a read of the ODT transfer 
status and control register 604 by the peripheral. 

(e) When the ODT engine 100 has detected a PCI bus 
event that has caused a PCI bus latency counter to 
time out, or a premature termination of a PCI bus 
master access, either of which causes a maskable 
interrupt. Preferably, this interrupt is cleared by a read 
of the ODT transfer status and control register 604 by 
the peripheral. 

(g) When an emergency ODT engine stop condition has 
occurred due to a mismatch of the ODT engine's upper 6 bits 
of the host interrupt queue pointer register 204 with the 
declared range of the host interrupt queue pointer 
register 204. When this state has been detected, the 
ODT engine 100 will immediately halt and terminate a 
current block transfer, then cause a non-maskable (or 
maskable) interrupt to the peripherals and to the host 
106. 
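
The masking of the interrupt sources listed above might be modeled in C as shown below. The bit positions, the enumerator names, and the separate non-maskable set are assumptions for illustration; the text names the sources but does not assign them to register bits.

```c
#include <stdint.h>

/* Hypothetical bit assignments for the peripheral interrupt sources. */
enum {
    ODT_IRQ_HOST_COLLISION  = 1 << 0,  /* (a) HI bit not cleared by host    */
    ODT_IRQ_STREAM_DONE     = 1 << 1,  /* (b) stream transfer completed     */
    ODT_IRQ_QUEUE_COLLISION = 1 << 2,  /* (c) queue entry MSB not cleared   */
    ODT_IRQ_RAM_WAIT        = 1 << 3,  /* (d) wait on peripheral RAM access */
    ODT_IRQ_PCI_EVENT       = 1 << 4,  /* (e) latency time-out or abort     */
    ODT_IRQ_EMERGENCY_STOP  = 1 << 5   /* (g) pointer range mismatch        */
};

/* Return the sources that are actually delivered: those enabled by the
 * mask, plus any source configured as non-maskable. */
static uint32_t odt_irq_deliverable(uint32_t raised,
                                    uint32_t enable_mask,
                                    uint32_t non_maskable)
{
    return (raised & enable_mask) | (raised & non_maskable);
}
```

Passing ODT_IRQ_EMERGENCY_STOP in the non_maskable argument models the case where the emergency stop is treated as non-maskable; passing 0 models the fully maskable case.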

A maskable interrupt may be generated for the host 106 
when the ODT engine 100 has completed a block transfer for 
one or more streams. This interrupt from the ODT engine 100 
is intended to be used by the host 106 
to manage the specified streams' cyclic buffers 206. This 
interrupt is cleared when the host 106 reads the relevant entry 
in the dynamic stream interrupt queue 202. 

Another maskable interrupt may be generated for the host 
106 when the ODT engine 100 has detected a collision with 
the dynamic stream interrupt queue 202 via its MSB bit 330 
not being cleared. The host 106 must service each stream's 
cyclic buffer 206 indicated by each entry in the dynamic 
stream interrupt queue 202, then clear the MSB 774 (or other 
designated bit) in the relevant entry to inform the ODT 
engine 100 that the host 106 has completed the relevant 
cyclic buffer service request. This interrupt is preferably 
cleared when the host 106 reads the dynamic stream 
interrupt queue 202. 

Thus, in accordance with the principles of the present 
invention, an efficient, high capacity, flexible, and 
distributed block data transfer system is provided. 

While the invention has been described with reference to 
the exemplary embodiments thereof, those skilled in the art 
will be able to make various modifications to the described 
embodiments of the invention without departing from the 
true spirit and scope of the invention. 

What is claimed is: 

1. A block memory transfer module comprising: 

a start address for a block of memory to be transferred, 
maintained in memory of a first device; and 

a length of said block of memory to be transferred, 
maintained in memory of a second device separate 
from said first device; 

wherein said length of said block of memory to be 
transferred is variable without requiring intervention by 
said first device. 

2. The block memory transfer module according to claim 
1, wherein: 

said first device is a host. 

3. The block memory transfer module according to claim 
2, wherein: 

said second device is a peripheral device including said 
block of memory. 

4. The block memory transfer module according to claim 
1, wherein: 

said second device is a peripheral device including said 
block of memory. 

5. The block memory transfer module according to claim 
1, further comprising: 

a burst type data transfer bus between said first device and 
said second device. 

6. The block memory transfer module according to claim 
5, wherein: 

said burst type data transfer bus is a Peripheral Component 
Interconnect (PCI) bus. 

7. The block memory transfer module according to claim 
6, wherein: 

said first device is a host processor of a personal 
computer; and 

said second device is a peripheral in said personal 
computer. 

8. A method of transferring a large plurality of blocks of 
data over separate data transfer channels, said method 
comprising: 

distributing a plurality of data transfer engines among 
respective devices connected to a data bus, each data 
transfer engine including a length of a respective at 
least one of said plurality of blocks of data; 

maintaining a centralized data buffer in a host relating to 
one of a source and destination of each of said plurality 
of blocks of data to be transferred; 

transferring each of said plurality of blocks of data over 
a separate one of said plurality of data transfer channels 
based on said length of said plurality of blocks of data 
established by each of said distributed plurality of data 
transfer engines; and 

changing said length of said respective at least one of said 
plurality of blocks of data without requiring 
intervention by said host. 

9. The method of transferring a large plurality of blocks 
of data over separate data transfer channels according to 
claim 8, said method further comprising: 

maintaining a centralized start address relating to a 
starting address of a source of each of said plurality of 
blocks of data to be transferred separate from a storage 
device for storing said lengths of said plurality of 
blocks of data. 

10. The method of transferring a large plurality of blocks 
of data over separate data transfer channels according to 
claim 8, wherein: 

said data buffer is cyclic. 

11. The method of transferring a large plurality of blocks 
of data over separate data transfer channels according to 
claim 8, wherein: 

said data bus is a burst type data transfer bus. 

12. The method of transferring a large plurality of blocks 
of data over separate data transfer channels according to 
claim 11, wherein: 

said burst type data transfer bus is a PCI bus. 

13. The method of transferring a large plurality of blocks 
of data over separate data transfer channels according to 
claim 8, wherein: 

said large plurality is more than seven. 

14. Apparatus for transferring a large plurality of blocks 
of data over separate data transfer channels, said method 
comprising: 

a plurality of data transfer means for transferring at least 
one block of data, said plurality of data transfer means 
being distributed among a respective plurality of 
devices connected to a data bus, each data transfer 
means including a length of a respective at least one of 
said plurality of blocks of data; 

centralized data buffer means maintained in a host for 
containing one of a source and destination of each of 
said plurality of blocks of data to be transferred; 

means for transferring each of said plurality of blocks of 
data over a separate one of said plurality of data transfer 
channels based on said length of said plurality of blocks 
of data established by each of said distributed plurality 
of data transfer engines; 

means for changing said length of said respective at least 
one of said plurality of blocks of data without requiring 
intervention by said host. 

15. The apparatus for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 14, further comprising: 

means for maintaining a centralized start address relating 
to a starting address of a source of each of said plurality 
of blocks of data to be transferred separate from a 
storage device for storing said lengths of said plurality 
of blocks of data. 

16. The apparatus for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 14, wherein: 

said centralized data buffer means is cyclic. 

17. The apparatus for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 14, wherein: 

said data bus is a burst type data transfer bus. 

18. The apparatus for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 17, wherein: 

said burst type data transfer bus is a PCI bus. 

19. The apparatus for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 14, wherein: 

said large plurality is more than seven. 

20. A system adapted for transferring a large plurality of 
blocks of data over separate data transfer channels, said 
system comprising: 

a plurality of computer devices each comprising a respec- 
tive data transfer engine, each of said plurality of 
computer devices interconnected via a data bus, each 
data transfer engine including storage for a length of a 
respective at least one of said plurality of blocks of 
data; and 
a host computer device including a centralized data buffer 
relating to one of a source and destination of each of 
said plurality of blocks of data to be transferred, said 
host computer device including a starting address of 
each of said plurality of blocks of data; 

wherein said length of said respective at least one of said 
plurality of blocks of data is variable without requiring 
intervention by said host computer device. 

21. The system adapted for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 20, wherein: 

said data bus is a PCI bus. 

22. The system adapted for transferring a large plurality of 
blocks of data over separate data transfer channels according 
to claim 20, wherein: 

said data bus is a burst type data bus. 